
    Arabic nested noun compound extraction based on linguistic features and statistical measures

    The extraction of Arabic nested noun compounds is significant for several research areas such as sentiment analysis, text summarization, word categorization, grammar checking, and machine translation. Much research has studied the extraction of Arabic noun compounds using linguistic approaches, statistical methods, or a hybrid of both. Many of the existing approaches concentrate on extracting bigram or trigram noun compounds. Nonetheless, extracting 4-gram or 5-gram nested noun compounds is a challenging task due to morphological, orthographic, syntactic, and semantic variations. Several features, such as unithood, contextual information, and termhood, strongly affect how efficiently noun compounds are extracted. Hence, there is a need to improve the effectiveness of Arabic nested noun compound extraction. This paper therefore proposes a hybrid of a linguistic approach and a statistical method to enhance the extraction of Arabic nested noun compounds. Several pre-processing phases are presented, including transformation, tokenization, and normalisation. The linguistic approaches used in this study consist of part-of-speech tagging and named-entity patterns, whereas the proposed statistical methods consist of the NC-value, NTC-value, NLC-value, and a combination of these association measures. The results show that the combined association measures outperformed the NLC-value, NTC-value, and NC-value for nested noun compound extraction, achieving 90%, 88%, 87%, and 81% for bigrams, trigrams, 4-grams, and 5-grams, respectively.
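
    The NC-value extends the C-value measure of Frantzi, Ananiadou and Mima, which scores a multi-word candidate a as log2|a| * f(a), where |a| is its length in words and f(a) its frequency; when a is nested inside longer candidates, the average frequency of those longer candidates is subtracted from f(a) first. The sketch below computes only this standard C-value on hypothetical English bigram and trigram frequencies; the NTC-value, NLC-value, and the combined measure proposed in the paper are not reproduced, and the function names are illustrative.

```python
import math


def is_nested(shorter, longer):
    """True if the word tuple `shorter` appears contiguously inside the longer tuple `longer`."""
    n, m = len(shorter), len(longer)
    if n >= m:
        return False
    return any(longer[i:i + n] == shorter for i in range(m - n + 1))


def c_value(freqs):
    """Standard C-value for each multi-word candidate (tuple of words -> frequency)."""
    scores = {}
    for a, f_a in freqs.items():
        containers = [b for b in freqs if is_nested(a, b)]  # T_a: longer candidates containing a
        weight = math.log2(len(a))
        if not containers:
            scores[a] = weight * f_a
        else:
            # subtract the mean frequency of the candidates that contain a
            scores[a] = weight * (f_a - sum(freqs[b] for b in containers) / len(containers))
    return scores


# Hypothetical frequencies: the bigram occurs inside both trigrams
freqs = {
    ("foreign", "minister"): 10,
    ("foreign", "minister", "office"): 4,
    ("former", "foreign", "minister"): 2,
}
scores = c_value(freqs)
```

    Here the bigram scores log2(2) * (10 - (4 + 2) / 2) = 7.0, because part of its frequency is attributed to the longer terms that contain it.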

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    AN ENHANCED FEATURE SELECTION METHOD BASED ON GREY WOLF OPTIMIZER FOR CLASSIFICATION PROBLEMS

    This research focuses mainly on classification, in which every instance in the dataset is assigned to its target class based on the information carried by its features. However, selecting suitable features is hard because the search space is generally large: a dataset often contains redundant and unnecessary features, which in turn degrade classification performance. Feature selection is widely used to address this issue by keeping only the features most relevant to the classification. In fact, feature selection aims to remove redundant and irrelevant features and build the model more efficiently. Feature selection methods fall into two major types, wrapper and filter; this thesis focuses only on wrapper approaches.
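
    A wrapper approach scores each candidate feature subset by running a classifier on it. The sketch below assumes a fitness commonly used in GWO-based feature selection studies: a weighted sum of the classification error and the fraction of selected features (alpha = 0.99 here), with a 1-nearest-neighbour classifier evaluated by leave-one-out. The classifier, the weighting, the function names, and the toy data are assumptions for illustration, not the thesis's exact setup.

```python
import math


def knn_error(X, y, mask, k=1):
    """Leave-one-out error rate of k-NN using only the features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 1.0  # treat an empty subset as a total failure
    wrong = 0
    for i in range(len(X)):
        # (distance, label) pairs to every other instance, nearest first
        neighbours = sorted(
            (math.dist([X[i][j] for j in cols], [X[r][j] for j in cols]), y[r])
            for r in range(len(X)) if r != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        wrong += predicted != y[i]
    return wrong / len(X)


def wrapper_fitness(X, y, mask, alpha=0.99):
    """Lower is better: alpha * error + (1 - alpha) * fraction of features kept."""
    return alpha * knn_error(X, y, mask) + (1 - alpha) * sum(mask) / len(mask)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 0.0]]
y = [0, 0, 1, 1]
```

    On this toy data the subset [1, 0] gives zero error and a fitness of about 0.005, while keeping both features misclassifies every instance, so a wrapper search would prefer the smaller, informative subset.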

    Enhanced Weight-Optimized Recurrent Neural Networks Based on Sine Cosine Algorithm for Wave Height Prediction

    Constructing offshore and coastal structures with the highest level of stability and the lowest cost, while preventing the risk of failure, is the goal that stakeholders seek. The successful construction of such projects mostly relies on well-analyzed and well-modeled metocean data that yield highly accurate predictions of ocean environmental conditions, including waves and wind. Over the past decades, coastal projects have been planned and designed using traditional static analysis, which requires tremendous effort and costly resources to validate the data and determine how metocean conditions transform. The wind plays an essential role in the oceanic atmosphere and contributes to the formation of waves. This paper proposes an enhanced weight-optimized neural network based on the Sine Cosine Algorithm (SCA) to accurately predict wave height. Three neural network models, Long Short-Term Memory (LSTM), Vanilla Recurrent Neural Network (VRNN), and Gated Recurrent Unit (GRU), are enhanced: instead of random weight initialization, SCA generates weight values that are adapted to the nature of the data and the model structure. In addition, a Grid Search (GS) is used to automatically find the best model configurations. Metocean datasets are used to validate the performance of the proposed models, and the original LSTM, VRNN, and GRU are implemented as benchmark models. The results show that the optimized models outperform the three original benchmark models in terms of mean squared error (MSE), root mean square error (RMSE), and mean absolute error (MAE).
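
    In Mirjalili's original SCA, each candidate moves around the best solution found so far, P, with X_j <- X_j + r1 * sin(r2) * |r3 * P_j - X_j| (cos(r2) instead of sin(r2) when r4 >= 0.5), where r1 shrinks linearly over the iterations. The sketch below uses that update to fit the two weights of a hypothetical linear model on toy data, standing in for the LSTM, VRNN, and GRU weights that the paper initializes with SCA; the model, data, bounds, and function names are assumptions.

```python
import math
import random


def sca_minimize(loss, dim, n_agents=20, iters=200, lb=-1.0, ub=1.0, a=2.0, seed=0):
    """Basic Sine Cosine Algorithm; returns the best vector found and its loss."""
    rng = random.Random(seed)
    agents = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_agents)]
    best = min(agents, key=loss)[:]  # destination point P
    best_loss = loss(best)
    for t in range(iters):
        r1 = a - t * a / iters  # shrinks linearly: exploration first, exploitation later
        for x in agents:
            for j in range(dim):
                r2 = rng.uniform(0.0, 2.0 * math.pi)
                r3 = rng.uniform(0.0, 2.0)
                r4 = rng.random()
                wave = math.sin(r2) if r4 < 0.5 else math.cos(r2)
                # move around P, clipped to the search bounds
                x[j] = min(ub, max(lb, x[j] + r1 * wave * abs(r3 * best[j] - x[j])))
            current = loss(x)
            if current < best_loss:
                best, best_loss = x[:], current
    return best, best_loss


# Hypothetical toy data generated by y = 0.5*x1 - 0.3*x2
data = [((x1, x2), 0.5 * x1 - 0.3 * x2) for x1 in (-1, 0, 1) for x2 in (-1, 0, 1)]


def mse(w):
    return sum((w[0] * a + w[1] * b - y) ** 2 for (a, b), y in data) / len(data)


weights, err = sca_minimize(mse, dim=2)
```

    In a neural network the same search would run over the flattened weight vector, with the training loss as the objective.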

    A Novel One-Dimensional CNN with Exponential Adaptive Gradients for Air Pollution Index Prediction

    Air pollution is one of the world's most significant challenges. Predicting air pollution is critical for air quality research because air quality affects public health. The Air Pollution Index (API) is a convenient tool for describing air quality. Air pollution predictions can provide accurate information about future pollution levels, which helps control air pollution effectively. Governments have expressed growing concern about air pollution due to its global effect on human health and sustainable growth. This paper proposes a novel forecasting model using a One-Dimensional Deep Convolutional Neural Network (1D-CNN) and Exponential Adaptive Gradients (EAG) optimization to predict the API for a selected location, Klang, a city in Malaysia. The proposed 1D-CNN–EAG exponentially accumulates past model gradients to adaptively tune the learning rate and to converge in both convex and non-convex areas. We use hourly air pollution data over three years (January 2012 to December 2014) for training. Parameter optimization and model evaluation were accomplished by a grid search with k-fold cross-validation. The results confirm that the proposed approach achieves better prediction accuracy than the benchmark models in terms of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and the Correlation Coefficient (R-Squared), with values of 2.036, 2.354, 4.214, and 0.966, respectively, as well as in terms of time complexity.
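
    The four reported error metrics can be computed as in the sketch below, shown on hypothetical API values. R-Squared is computed here as the coefficient of determination, 1 - SS_res / SS_tot; that definition is an assumption, since the abstract calls it a correlation coefficient. MAPE is expressed in percent and is undefined when an observed value is zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (%) and R-squared for paired observed/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}


# Hypothetical hourly API observations and forecasts
metrics = regression_metrics([50, 60, 70, 80], [52, 58, 71, 79])
```

    For these values the errors are -2, 2, -1 and 1, so MAE is 1.5 and RMSE is sqrt(2.5); RMSE is never smaller than MAE because it weights large errors more heavily.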

    Particle Swarm Optimization: A Comprehensive Survey

    Particle swarm optimization (PSO) is one of the most well-regarded swarm-based algorithms in the literature. Although the original PSO has shown good optimization performance, it still suffers severely from premature convergence. As a result, many researchers have modified it, producing a large number of PSO variants with either slightly or significantly better performance. The standard PSO has mainly been modified through four strategies: modifying the PSO controlling parameters, hybridizing PSO with other well-known meta-heuristic algorithms such as the genetic algorithm (GA) and differential evolution (DE), cooperation, and multi-swarm techniques. This paper attempts to provide a comprehensive review of PSO, including the basic concepts of PSO, binary PSO, neighborhood topologies in PSO, recent and historical PSO variants, remarkable engineering applications of PSO, and its drawbacks. Moreover, this paper reviews recent studies that use PSO to solve feature selection problems. Finally, eight potential research directions that can help researchers further enhance the performance of PSO are provided.
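
    The controlling parameters that many PSO variants tune are visible in the basic global-best velocity update v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x). The sketch below implements that update with an inertia weight w; the parameter values (w = 0.7, c1 = c2 = 1.5), the bounds, and the sphere test function are assumed common choices, not values taken from the survey.

```python
import random


def pso_minimize(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5,
                 lb=-5.0, ub=5.0, seed=1):
    """Global-best PSO with inertia weight; returns the best position and its value."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # cognitive pull
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # social pull
                pos[i][d] = min(ub, max(lb, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=3)
```

    Raising w favours exploration and lowering it favours exploitation, which is one reason parameter-adaptation schemes are among the most common PSO modifications.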

    BAOA: Binary Arithmetic Optimization Algorithm With K-Nearest Neighbor Classifier for Feature Selection

    The Arithmetic Optimization Algorithm (AOA) is a recently proposed metaheuristic algorithm that has been shown to perform well in several benchmark tests. The AOA is a metaheuristic that uses the distribution behavior of the main arithmetic operators: multiplication, division, subtraction, and addition. This paper proposes a binary version of the Arithmetic Optimization Algorithm (BAOA) to tackle the feature selection problem in classification. The algorithm's search space is converted from continuous to binary using the sigmoid transfer function to match the nature of the feature selection task. A wrapper-based approach with a K-Nearest Neighbors (KNN) classifier is used to find the best possible solutions. This study uses 18 benchmark datasets from the University of California, Irvine (UCI) repository to evaluate the proposed binary algorithm's performance. The results demonstrate that BAOA outperformed the Binary Dragonfly Algorithm (BDF), Binary Particle Swarm Optimization (BPSO), Binary Genetic Algorithm (BGA), and Binary Cat Swarm Optimization (BCAT) across several performance metrics, including classification accuracy, the number of selected features, and the best and worst fitness values.
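
    The continuous-to-binary conversion can be sketched with the plain sigmoid S(x) = 1 / (1 + e^(-x)): each dimension becomes 1 (feature selected) when a uniform random number falls below S(x). The exact transfer-function variant used in BAOA is assumed here, and the function names are illustrative.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # avoids overflow for large negative x
    return e / (1.0 + e)


def binarize(position, rng):
    """Map a continuous search-agent position to a 0/1 feature mask."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in position]


# Hypothetical continuous position produced by the optimizer
mask = binarize([2.0, -2.0, 0.3], random.Random(42))
```

    Large positive coordinates almost always select the feature, large negative ones almost never do, and values near zero select it about half the time, which keeps some randomness in the binary search.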

    Machine Learning Models for the Identification of Prognostic and Predictive Cancer Biomarkers: A Systematic Review

    The identification of biomarkers plays a crucial role in personalized medicine, both in clinical and research settings. However, distinguishing between predictive and prognostic biomarkers can be challenging because of the overlap between the two. A prognostic biomarker predicts the future outcome of cancer regardless of treatment, whereas a predictive biomarker predicts the effectiveness of a therapeutic intervention. Misclassifying a prognostic biomarker as predictive (or vice versa) can have serious financial and personal consequences for patients. To address this issue, various statistical and machine learning approaches have been developed. The aim of this study is to present an in-depth analysis of recent advancements, trends, challenges, and future prospects in biomarker identification. A systematic search of PubMed was conducted to identify relevant studies published between 2017 and 2023. The selected studies were analyzed to better understand the concept of biomarker identification, evaluate machine learning methods, assess the level of research activity, and highlight the application of these methods in cancer research and treatment. Furthermore, existing obstacles and concerns are discussed to identify prospective research areas. We believe that this review will serve as a valuable resource for researchers, providing insights into the methods and approaches used in biomarker discovery and identifying future research opportunities.

    Multi-Criteria Energy Management with Preference Induced Load Scheduling Using Grey Wolf Optimizer

    Minimizing energy costs while maintaining consumer satisfaction is a very challenging task in a smart home. The contradictory nature of these two objective functions (cost of energy and satisfaction level) requires a multi-objective problem formulation that can offer several trade-off solutions to the consumer. Previous works have considered cost and satisfaction individually, but there is a lack of research that considers both objectives simultaneously. Our work proposes an optimal home appliance scheduling method to obtain an optimal satisfaction level at minimum energy cost. To achieve this goal, first, an energy management system (EMS) is developed using a rule-based algorithm to reduce the cost of energy through efficient use of renewable energy resources and an energy storage system. The second part involves developing an optimization algorithm for optimal appliance scheduling based on the consumer satisfaction level, taking into account time- and device-based preferences. For that purpose, a multi-objective grey wolf accretive satisfaction algorithm (MGWASA) is developed, with the aim of providing trade-off solutions for optimal load patterns based on the cost per unit satisfaction index (Cs_index) and percentage satisfaction (%S). The MGWASA is evaluated for a grid-connected smart home model with EMS. To ensure the accuracy of the numerical simulations, actual climatological data and consumer preferences are considered. The Cs_index is derived for six different cases by simulating (a) optimal load, (b) ideal load, and (c) base (random) load, with and without EMS. The results of MGWASA are benchmarked against other state-of-the-art optimization algorithms, namely, the binary non-dominated sorting genetic algorithm-2 (NSGAII), multi-objective binary particle swarm optimization (MOBPSO), multi-objective artificial bee colony (MOABC), and the multi-objective evolutionary algorithm (MOEA). With the proposed appliance scheduling technique, a % reduction in annual energy cost is achieved. MGWASA yields a Cs_index of 0.049 with %S of 97%, compared with NSGAII, MOBPSO, MOABC, and MOEA, which yield %S of 95%, 90%, 92%, and 94% at Cs_index values of 0.052, 0.048, 0.0485, and 0.050, respectively. Moreover, various related aspects, including energy balance, PV utilization, energy cost, net present cost, and cash payback period, are also analyzed. Lastly, a sensitivity analysis is carried out to demonstrate the impact of future uncertainties in the system inputs.
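
    MGWASA builds on the Grey Wolf Optimizer. For reference, the standard single-objective GWO moves every wolf toward the three best solutions (alpha, beta, delta) using A = 2a*r1 - a and C = 2*r2, with a decreasing linearly from 2 to 0, and averages the three resulting positions. The sketch below covers only that basic update on a toy continuous function; MGWASA's multi-objective handling, satisfaction model, and appliance-scheduling encoding are not reproduced, and all names and parameter values are illustrative.

```python
import random


def gwo_minimize(f, dim, n_wolves=20, iters=200, lb=-5.0, ub=5.0, seed=2):
    """Standard Grey Wolf Optimizer; returns the best position seen and its value."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    best, best_val = None, float("inf")
    for t in range(iters):
        scored = sorted(((f(w), w) for w in wolves), key=lambda s: s[0])
        if scored[0][0] < best_val:
            best_val, best = scored[0][0], scored[0][1][:]
        leaders = [w[:] for _, w in scored[:3]]  # alpha, beta, delta
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 toward 0
        for w in wolves:
            for d in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    total += leader[d] - A * abs(C * leader[d] - w[d])
                w[d] = min(ub, max(lb, total / 3.0))  # average of the three pulls
    final = min(wolves, key=f)  # positions after the last update
    if f(final) < best_val:
        best, best_val = final[:], f(final)
    return best, best_val


def sphere(x):
    return sum(v * v for v in x)


best, best_val = gwo_minimize(sphere, dim=3)
```

    While |A| > 1, which is only possible early on when a is large, wolves can be pushed away from the leaders (exploration); as a shrinks they close in on them (exploitation).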

    Classification of Reservoir Recovery Factor for Oil and Gas Reservoirs: A Multi-Objective Feature Selection Approach

    No full text
    The accurate classification of the reservoir recovery factor is hampered by irregularities such as noisy and high-dimensional features associated with reservoir measurements or characterization. These irregularities, especially a large number of features, make accurate classification of the reservoir recovery factor difficult, as the generated reservoir features are usually heterogeneous. Consequently, it is imperative to select relevant reservoir features while preserving or improving recovery classification accuracy. This problem can be treated as a multi-objective optimization problem, since there are two conflicting objectives: minimizing the number of measurements and preserving high recovery classification accuracy. In this study, wrapper-based multi-objective feature selection approaches are proposed to estimate the set of Pareto optimal solutions that represents the best trade-offs between these two objectives. Specifically, three multi-objective optimization algorithms, Non-dominated Sorting Genetic Algorithm II (NSGA-II), Multi-Objective Grey Wolf Optimizer (MOGWO), and Multi-Objective Particle Swarm Optimization (MOPSO), are investigated for selecting relevant features from the reservoir dataset. To the best of our knowledge, this is the first time multi-objective optimization has been used for reservoir recovery factor classification. The Artificial Neural Network (ANN) classification algorithm is used to evaluate the selected reservoir features. The experimental results show that the proposed MOGWO-ANN outperforms the other two approaches (MOPSO and NSGA-II) in producing non-dominated solutions with a small subset of features and a reduced classification error rate.
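
    All three algorithms rely on Pareto dominance: one solution dominates another if it is no worse in every objective and strictly better in at least one. The sketch below filters hypothetical (number of selected features, error rate) pairs, both minimized, down to their non-dominated set; it shows only the dominance test, not NSGA-II, MOGWO, or MOPSO themselves, and the function names are illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (number of selected features, classification error rate)
candidates = [(3, 0.20), (5, 0.12), (4, 0.25), (8, 0.12), (2, 0.30)]
front = pareto_front(candidates)
```

    Here (4, 0.25) is dominated by (3, 0.20) and (8, 0.12) by (5, 0.12), so the front keeps (3, 0.20), (5, 0.12) and (2, 0.30): each trades fewer features for a higher error rate or vice versa.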